Birth interval is closely related to maternal and infant health. According to 
the World Health Organization (WHO), the interval between two births should be 
at least 33 months. This study is the first to discuss the short birth interval 
(SBI) in Indonesia and uses data from the Indonesian Demographic and 
Health Surveys 2017 with a total of 34,200 respondents. Birth interval 
means the length of time between the birth of the first child and the second 
child. A birth is categorized as SBI if the interval between births is less 
than 33 months. The variables used include mother's age, mother's age at first 
giving birth, father's age, household wealth, succeeding birth interval, 
breastfeeding status, child sex, residence, mother's education, health 
insurance, mother's working status, contraception used, child alive, total 
children, number of living children, and household members. Machine learning 
algorithms including logistic regression, Naive Bayes, lazy locally weighted 
learning (LWL), and sequential minimal optimization (SMO) are applied to 
classify SBI. Based on the values of accuracy, precision, recall, F-score, 
Matthews correlation coefficient (MCC), receiver operating characteristic (ROC) 
area, and precision-recall curve (PRC) area, Naive Bayes is the best algorithm, 
with scores of 0.891, 0.889, 0.891, 0.885, 0.687, 0.972, and 0.960, 
respectively. Additionally, 18.25% of mothers were classified as still giving 
birth within a short interval. 
1. INTRODUCTION 

Birth interval is closely related to maternal and infant health [1]-[3]. Maternal and infant health is 
one of the important points in the sustainable development goals (SDGs), especially the goal of good health 
and well-being [4], [5]. Although no research clearly shows that a short birth interval (SBI) directly causes 
maternal or infant death, its effects are likely to be detrimental. It can lead to premature birth [6], [7]. 
In addition, a short birth interval can result in suboptimal nutrition for newborns, including reduced 
exclusive breastfeeding, which in turn can lead to stunting or wasting. According to the World Health 
Organization (WHO), the recommended birth interval is at least 33 months after the previous birth [8]-[10]. 
An appropriate interval between births helps the mother recover, both physically and psychologically, before 
the next birth. The right birth spacing can therefore help maintain maternal, perinatal, neonatal, and child 
health. 

Indonesia, as a developing country, still experiences problems in the field of health, specifically the 
health of mothers and children. In the Southeast Asian region, Indonesia occupies the highest position in 
maternal mortality [11], [12]. In 2010, the maternal mortality rate (MMR) was recorded at 346 per 
100,000 live births, while the infant mortality rate (IMR) was 32 per 1,000 live births in 
2012. Hence, maternal and child health needs to be improved, and studying the interval between births is 
the objective of this research. A further purpose of this study is to obtain a clear picture of birth 
spacing in Indonesia and to identify the factors that affect it. 

There have not been many studies on the birth interval in Indonesia. One study found that a short birth 
interval significantly affects neonatal mortality in Indonesia [13]. It used 15,952 singleton live-born 
infants born from 1997 to 2002 and applied multilevel logistic regression (OR=2.82, p=0.00). Another 
study found that area of residence, education of the mother, and age of the mother influence the 
birth interval in West Papua and Yogyakarta, Indonesia [14]. The birth interval in that research means the 
interval between marriage and the birth of the first child. An extended Cox model was used, and the hazard 
ratios are 0.720, 1.708, 2.648, 4.361, and 0.955 for the area of residence, education of the mother (finished 
primary school, junior high school, senior high school), and age of the mother, respectively. A further study 
on short birth intervals used secondary data from the Indonesia Demographic and Health 
Survey 2012 and found that a short birth interval is one of the factors that cause maternal and infant 
mortality [15]. It studied birth spacing among multiparous women in Indonesia using the Mann-Whitney test, 
the Kruskal-Wallis test, and logistic regression. It concluded that 22.8% of women gave birth 
less than 3 years after the previous birth. A study on birth spacing and its relationship to infant mortality 
found that the two are negatively related [16]. That study used survival analysis, specifically the 
Cox proportional hazards model. It concluded that babies born less than 36 months after the previous 
birth were more likely to die than babies born after an interval of more than 36 months. 

Previous research shows that short birth intervals continue to occur in Indonesia year after year 
and that they can harm maternal and infant health. This study is conducted to classify 
short birth intervals in Indonesia. We propose classifying them with machine learning 
algorithms in order to determine the proportions of normal and short birth intervals. We use data from the 
Indonesia Demographic and Health Survey 2017. In addition, this study aims to assess the prevalence of 
adverse maternal and child health conditions in Indonesia through the birth interval. 


2. METHOD 

This study used secondary data, namely the Indonesian Demographic and Health Surveys 2017. 
The data were collected and maintained by the National Population and Family Planning Board, Statistics 
Indonesia, and the Ministry of Health, in collaboration with ICF under the 
Demographic and Health Surveys (DHS) program [17]. The survey is conducted periodically; previous rounds 
were held in 1987, 1991, 1994, 1997, 2002-2003, 2007, and 2012. The samples were women of childbearing 
age (15-49 years old) in 34 provinces in Indonesia. 

The sample of respondents included 34,200 women. Of these, 8,286 (24.23%) fell into 
the SBI category, defined as a birth interval of less than 33 months [10]. 
The remaining women were presumed to have a normal birth interval. The variables used include mother's age, 
mother's age at first giving birth, father's age, household wealth, succeeding birth interval, breastfeeding 
status, child sex, residence, mother's education, mother's working status, contraception used, child alive, total 
children, number of living children, household members, and health insurance [18]-[22]. 

We started by preprocessing the raw data. First, we selected the variables that correspond to the 
birth interval. We then checked the data for completeness. Next, we normalized the continuous 
attributes. Afterward, we encoded the short birth interval attribute as 1 or 0: an observation is 
categorized as SBI (1) if the birth interval is greater than 0 months but less than 33 months, and as 
not SBI (0) otherwise. Finally, we split the data using 10-fold cross-validation. 
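
The preprocessing steps above can be sketched as follows. This is a minimal illustration and not the code used in the study: the text does not say which normalization was applied, so min-max scaling is an assumption, and the function names, the toy data, and the random seed are hypothetical.

```python
import random

SBI_THRESHOLD_MONTHS = 33  # cut-off stated in the text


def encode_sbi(interval_months):
    """Return 1 for a short birth interval (0 < interval < 33 months), else 0."""
    return 1 if 0 < interval_months < SBI_THRESHOLD_MONTHS else 0


def min_max_normalize(values):
    """Rescale numbers to [0, 1]. Min-max scaling is an assumption here."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Hypothetical toy data: birth intervals in months and mothers' ages in years.
intervals = [12, 32, 33, 47, 0, 60]
ages = [19, 25, 31, 40, 22, 35]

labels = [encode_sbi(m) for m in intervals]
scaled_ages = min_max_normalize(ages)
folds = k_fold_indices(len(intervals), k=3)

# In each round, one fold is held out for testing and the rest are used for training.
for test_fold in folds:
    train_rows = [i for fold in folds if fold is not test_fold for i in fold]
```

Each row lands in exactly one test fold, so every observation is used for evaluation once and for training in the remaining rounds.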

In this study, we implemented several classification algorithms: logistic regression, Naive 
Bayes, sequential minimal optimization (SMO), and lazy locally weighted learning (LWL) were used to 
classify SBI [23]-[26]. Logistic regression is a classic statistical method and a common algorithm for 
binary classification, yet it has drawbacks when modeling complex multivariable nonlinear 
relationships [24], [27]-[29]. Naive Bayes remains one of the most effective and efficient algorithms for 
classification tasks; it offers simplicity and a short computational runtime [24], [25], [28]. SMO is 
an effective method for training support vector machines (SVMs), especially on sparse datasets. This 
method was chosen because, in the Indonesian Demographic and Health Surveys 2017 dataset, the target variable 
is mostly equal to zero. LWL has the advantage that it is extremely adaptable and provides a 
precise model in the long run [30]. The best model is selected using several goodness-of-fit 
criteria, namely the Matthews correlation coefficient (MCC), the area under the receiver operating 
characteristic (ROC) curve, the precision-recall curve (PRC) area, accuracy, 
precision, recall, and F1-score [31]-[33]. 
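
For reference, the threshold-based criteria listed above have the following standard definitions for a binary classifier. The symbols TP, FP, TN, and FN (true positives, false positives, true negatives, false negatives) are notation introduced here and do not appear in the original text.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall}     = \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \\
\text{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
                  {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.
\end{align}
```

Unlike accuracy, MCC uses all four cells of the confusion matrix in both numerator and denominator, which is why it is less easily inflated by a large majority class.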
3. RESULTS AND DISCUSSION 

Preprocessing omitted 2 observations because of incomplete information, leaving a total of 
34,198 women. Of these, 75.77% of mothers were classified as not having a short birth 
interval, that is, they had a normal birth interval, and the remaining 24.23% were categorized as having 
a short birth interval. A short birth interval means that the interval between births is less than 33 months. 

Table 1 shows some of the main sociodemographic characteristics of the respondents. More than 
50% of the mothers had a secondary education (51.89%), followed by primary education (32.44%), 
the group with the longest average birth interval, 47.02 months. Mothers with higher education 
(13.48%) have the shortest average birth interval, 30.97 months. This result is similar to the studies in 
[14], [15] and agrees with the report in [2], which stated that women with higher education have a higher 
risk of short birth intervals than women with lower education. Many factors affect this condition, such as 
age and female fertility. Women who complete higher education tend to do so at an older age than women 
who are not highly educated, and fertility declines as a woman gets older. 
Therefore, it is not surprising that women with higher education tend to give birth at shorter intervals. 

Of the total respondents, slightly more than half live in urban areas (50.47%), with an average birth 
interval of 40.94 months. This interval is slightly longer than the average for women living in 
rural areas (40.79 months). This is consistent with research stating that women who give birth in cities 
tend to have longer intervals [13], [14]. 


Table 1. Distribution of respondents' sociodemographic characteristics based on the average birth interval 

Characteristic             n (%)              Average birth interval 
                           (N = 34,198)       (in months) 
Mother’s education 
  No education             752 (2.20)         37.70 
  Primary                  11,093 (32.44)     47.02 
  Secondary                17,744 (51.89)     39.72 
  Higher                   4,609 (13.48)      30.97 
Residence 
  Rural                    16,939 (49.53)     40.79 
  Urban                    17,259 (50.47)     40.94 
Household wealth 
  Poorest                  8,062 (23.57)      36.21 
  Poorer                   6,672 (19.51)      41.43 
  Middle                   6,568 (19.21)      42.87 
  Richer                   6,559 (19.18)      43.09 
  Richest                  6,337 (18.53)      41.82 
Contraception used 
  No method                13,248 (38.74)     33.13 
  Traditional method       2,367 (6.92)       39.64 
  Modern method            18,583 (54.34)     46.54 
Mother’s working status 
  No                       14,219 (41.58)     39.96 
  Yes                      19,979 (58.42)     41.51 
Child alive 
  No                       2,138 (6.25)       30.38 
  Yes                      32,060 (93.75)     41.56 


Based on the household wealth variable, the largest group of respondents is in the poorest category 
(23.57%), while the other wealth categories are roughly balanced at between 18.5% and 19.5% each. Mothers 
at the poorest economic level also have the shortest average interval between births, 36.21 months. 
For the other economic levels, the birth intervals are similar, between 
41 and 43 months. The richer economic level has the longest birth interval, 43.09 months. These 
results are in line with research in [2], [15], [16], [34], on the reasoning that the better the 
economic level of a family, the better its access to health services and knowledge. 

Most respondents used modern contraceptive methods (54.34%), followed by no 
method (38.74%) and traditional methods (6.92%). The use of contraception affects the interval between 
births: mothers using no method have the shortest interval, 33.13 months on average, whereas 
mothers using modern contraceptives have the longest, 46.54 months. This can happen because 
contraceptives delay the time until the next conception [2], [15]. 

Employment status has little effect on birth interval. This can be seen from the small difference in 
interval between working mothers (58.42%) and non-working mothers (41.58%). The birth interval for working 
mothers is 41.51 months, while for non-working mothers it is 39.96 months. The birth interval for working 
mothers tends to be longer, which is expected [14]. 

Respondents whose child has died have a much shorter birth interval than respondents 
whose children are still alive. In this study, the birth interval for mothers whose child has died 
is 30.38 months, while for mothers whose child is alive it is 41.56 months. 
This is in line with research by Kurniawati, who described the 
existence of a replacement effect: mothers whose children die tend to seek 
a replacement soon afterward by getting pregnant again [15]. 

The sociodemographic characteristics described above indicate that several 
factors are associated with birth spacing. Figure 1 shows a map of the median birth interval for all 
provinces in Indonesia. The darker the shading, the longer the birth interval. Most of the eastern 
part of Indonesia has a shorter interval than the central and western parts. The 
provinces in the short birth interval category are Papua, West Papua, West Sulawesi, South 
Sulawesi, Maluku, East Nusa Tenggara, and North Sumatra. Many factors can cause these provinces to fall 
into this category. One of them is geography: the provinces are 
far from the national capital. This is especially true of Papua, where access to health, education, and 
the economy is still difficult. This is in line with the research by Hidayat, which also focuses on birth 
intervals in the Papua region [14]. 


Figure 1. Distribution map of the median birth interval (in months) in Indonesia 


Based on the sociodemographic results, a normal birth interval is associated with maternal education 
and a reasonably good household economy. The use of contraception also needs to be encouraged. Whether 
a mother works is a choice for each mother to discuss with her family. 
Mothers who live in urban areas have better access to health, education, and the 
economy than mothers who live in rural areas. Likewise, with respect to the province where the mother 
lives, cooperation with the government is required so that education, health, and 
economic opportunity are equally available to all Indonesian people no matter where they live. 

Besides the sociodemographic aspects, we classified short birth intervals using 4 machine learning 
algorithms: logistic regression, Naive Bayes, SMO, and lazy LWL. The goodness-of-fit results for the SBI 
classification models are shown in Table 2, Figure 2, and Figure 3. Table 2 presents the values of accuracy, 
precision, recall, and F-score for each algorithm. On these 4 metrics, the values do not differ 
much between algorithms. Naive Bayes obtained the highest score on all 4, with 
accuracy, precision, recall, and F-score values of 0.891, 0.889, 0.891, and 0.885, respectively. On the other 
hand, the lazy LWL algorithm performed worst, with the lowest scores 
for accuracy, recall, and F-score: 0.854, 0.854, and 0.832, respectively. 

In addition to comparing the 4 algorithms on these 4 metrics, we measured the 
MCC, PRC area, and ROC area. The MCC results can be seen in Figure 2. The Naive Bayes 
algorithm gave the highest MCC (0.687), whereas lazy LWL gave the lowest (0.571). Overall, the 
order of classifiers from best to worst MCC is Naive Bayes, logistic regression, SMO, and lazy LWL. The MCC 
values are smaller than the scores on the previous metrics. This may be because MCC accounts for 
class imbalance in binary classification tasks, which also makes it more robust than the other 
performance metrics. This is in line with studies stating that MCC is a better criterion than accuracy 
and F1 score for identifying the best-performing machine learning algorithm in binary 
classification [33], [35]. 


Table 2. Accuracy, precision, recall, F-score for the four algorithms 

Algorithms             Accuracy   Precision   Recall   F-score 
Logistic regression    0.862      0.858       0.862    0.854 
Naive Bayes            0.891      0.889       0.891    0.885 
SMO                    0.860      0.855       0.860    0.852 
Lazy LWL               0.854      0.867       0.854    0.832 


Figure 2. MCC for the four algorithms 


The other two performance scores are the PRC and ROC areas, which are shown in Figure 3. Of the 4 
algorithms, Naive Bayes gives the best performance, with PRC and ROC areas of 0.960 and 0.972. 
It is followed by lazy LWL, with PRC and ROC areas of 0.958 and 0.954. The third-best algorithm is 
logistic regression, with PRC and ROC areas of 0.914 and 0.897. The SMO 
algorithm performs worst, with PRC and ROC areas of 0.793 and 0.762, respectively. Across the 
7 performance metrics, Naive Bayes is the best classifier for short birth 
intervals in the Indonesia DHS 2017 dataset. These results are in line with a study comparing several machine 
learning algorithms for predicting individual survival. That study compared logistic regression, Naive Bayes, 
and random forest and found that Naive Bayes was the best algorithm [36]. Furthermore, in this binary 
classification, lazy LWL performed worst on most of the threshold-based metrics (accuracy, recall, F-score, 
and MCC). This may be because lazy LWL is better suited to data streams, as shown by research on real-world 
network intrusion detection, which found that lazy LWL works well for data streams or big data [30]. 
Figure 3. ROC and PRC area for the four algorithms 


The confusion matrix is used to clarify the classification results for each category. Table 3 provides 
the confusion matrix for the 4 algorithms. We compared the actual number of events with the results of each 
classifier. The first classifier, logistic regression, correctly classified 13.73% of mothers as SBI and 
72.51% as not SBI, and misclassified 13.76%. With the Naive 
Bayes classifier, 15.79% of mothers were correctly classified as having children in an SBI and 73.31% 
were correctly classified as not giving birth at close intervals, while 
10.90% were misclassified. SMO correctly classified 13.88% of mothers as SBI and 
72.14% as not SBI, and misclassified 13.98%. The last 
classifier, lazy LWL, correctly classified 10.29% of mothers as SBI and 75.11% as 
not SBI, and misclassified 14.60%. 


Table 3. The number of actual and predicted outcomes 

                                                       Status 
Classifier            Classification                   SBI               Not SBI 
                      Actual events number             8,286 (24.23%)    25,912 (75.77%) 
Logistic regression   Correctly predicted outcome      4,696 (13.73%)    24,798 (72.51%) 
                      Incorrectly predicted outcome    1,114 (3.26%)     3,590 (10.50%) 
Naive Bayes           Correctly predicted outcome      5,399 (15.79%)    25,071 (73.31%) 
                      Incorrectly predicted outcome    841 (2.46%)       2,887 (8.44%) 
SMO                   Correctly predicted outcome      4,747 (13.88%)    24,670 (72.14%) 
                      Incorrectly predicted outcome    1,242 (3.63%)     3,539 (10.35%) 
Lazy LWL              Correctly predicted outcome      3,520 (10.29%)    25,686 (75.11%) 
                      Incorrectly predicted outcome    226 (0.66%)       4,766 (13.94%) 
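
The counts in Table 3 can be related back to the scores in Table 2 with a short calculation. The sketch below is illustrative only and rests on two assumptions that the text does not state: that the "incorrectly predicted" counts are grouped by predicted class (so 841 is the number of mothers wrongly predicted as SBI by Naive Bayes), and that the reported precision, recall, and F-score are averages over the two classes weighted by class size. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, class-weighted precision/recall/F-score, and MCC."""
    n = tp + fp + tn + fn
    actual_pos, actual_neg = tp + fn, tn + fp

    def prf(hits, predicted_total, actual_total):
        # Precision, recall, and F-score for one class.
        p = hits / predicted_total
        r = hits / actual_total
        return p, r, 2 * p * r / (p + r)

    p_pos, r_pos, f_pos = prf(tp, tp + fp, actual_pos)  # SBI class
    p_neg, r_neg, f_neg = prf(tn, tn + fn, actual_neg)  # not-SBI class
    w_pos, w_neg = actual_pos / n, actual_neg / n       # class-size weights

    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / n,
        "precision": w_pos * p_pos + w_neg * p_neg,
        "recall": w_pos * r_pos + w_neg * r_neg,
        "f_score": w_pos * f_pos + w_neg * f_neg,
        "mcc": mcc,
        "predicted_sbi_share": (tp + fp) / n,
    }


# Naive Bayes counts from Table 3 (under the reading described above).
naive_bayes = binary_metrics(tp=5399, fp=841, tn=25071, fn=2887)
for name, value in naive_bayes.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the Naive Bayes counts give values that round to the reported 0.891, 0.889, 0.891, 0.885, and 0.687, and the share of mothers predicted as SBI, (5,399 + 841) / 34,198, rounds to 18.25%.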


Through the confusion matrix, we can see the number of events classified by each of the four classifiers. 
The classifiers gave results that differ from one another and from the actual number of events. Naive 
Bayes gave the result closest to the actual numbers, consistent with the metric scores above. 
With Naive Bayes, 5,399 mothers (15.79%) were correctly classified as having children in an SBI 
and 25,071 (73.31%) were correctly classified as not giving birth at close intervals. 
On the other hand, 2,887 mothers (8.44%) were wrongly classified as not giving 
birth in short birth intervals, while 841 (2.46%) were wrongly classified as giving birth 
in short birth intervals. Based on these results, we can advise the government, prospective 
parents, and parents who are planning to have more children to consider the interval between births, 
because birth spacing that is too close adversely affects maternal, perinatal, neonatal, and child health. 
In addition, the WHO recommendation that the interval between births be at least 33 months should be 
considered. 


4. CONCLUSION 

The best classifier for short birth intervals in Indonesia using the 2017 Indonesia 
Demographic and Health Survey data is Naive Bayes. With this classifier, 15.79% of mothers were 
classified as SBI, while 73.31% were classified as not SBI. Although the 
share classified as SBI is low, it is still substantial in absolute terms: 6,240 mothers 
of productive age (15-44 years) were classified into the 
short birth interval category. If this continues, the health of the mother and newborn, as well as of 
siblings born earlier, can be compromised. Therefore, the government has an important role in making 
policy that promotes a normal birth interval according to WHO standards. The government also 
plays a role in the equal distribution of education, health, and economic opportunity so that mothers all 
around Indonesia have equal access to those facilities. Finally, knowledge about family health, especially 
maternal and child health, is needed by prospective parents and parents who want to have children in the 
future. 